PyDigger - unearthing stuff about Python


Name Version Summary Date
kreuzberg 3.9.1 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-07-29 15:54:53
docling-analysis-framework 1.1.0 AI-ready analysis framework for PDF and Office documents using Docling for content extraction 2025-07-29 14:34:10
xml-analysis-framework 1.3.0 XML document analysis and preprocessing framework designed for AI/ML data pipelines 2025-07-29 14:32:08
document-data-extractor 1.0.4 Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract 2025-07-29 08:25:56
qdrant-loader 0.5.1 A tool for collecting and vectorizing technical content from multiple sources and storing it in a QDrant vector database. 2025-07-29 06:41:31
contextgem 0.12.1 Effortless LLM extraction from documents 2025-07-27 20:11:08
aikitx 1.0.0 A comprehensive GUI toolkit for Large Language Models (LLMs) with GGUF support, document processing, email automation, and multi-backend inference 2025-07-25 19:44:31
llm-data-converter 2.2.0 Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract 2025-07-25 13:32:07
llm-text-splitter 0.2.0 A lightweight, rule-based text splitter for LLM context window management, handles multiple file formats and enriches chunks with metadata. 2025-07-24 12:21:01
mseep-kreuzberg 3.8.2 Document intelligence framework for Python - Extract text, metadata, and structured data from diverse file formats 2025-07-17 03:32:28
pdf-splitter-cli 0.1.1 A modern command-line tool to split PDF files into smaller chunks with progress bars and automatic filename generation 2025-07-17 01:37:12
pdf-ocr-processor 2.0.3 Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays 2025-07-11 21:11:24
ai-chunking 0.1.4 A powerful Python library for semantic document chunking and enrichment using AI 2025-03-16 20:44:19
atai-pdf-tool 0.1.0 A tool for parsing and extracting text from PDF files with OCR capabilities 2025-02-27 11:15:46
smart-llm-loader 0.1.0 A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document chunking and RAG applications. Features smart context-aware segmentation, multi-LLM support, and optimized content extraction for enhanced RAG performance. 2025-02-14 12:42:55
fileseek 0.1.3 FileSeek – AI-Powered Local Document Archive&Search 2025-02-08 07:13:54
tikara 0.1.5 The metadata and text content extractor for almost every file type. 2025-01-26 23:33:40
peslac 0.1.4 A Python package for the Peslac API 2025-01-25 06:54:20
aimq 0.1.0 A robust message queue processor for Supabase pgmq with AI-powered document processing capabilities 2025-01-18 22:17:05
pdf-parser-header-footer 0.1.0 A Python package for processing PDFs with header and footer detection 2025-01-14 16:10:34